Learning Language Identification Models: A Comparative Analysis of the Distinctive Features of Names and Common Words
Abstract
The intuition and basic hypothesis that this paper explores is that names are more characteristic of their language than common words are, and that a single name can carry enough clues to confidently identify its language where random text of the same length would not. To test this hypothesis, n-gram modelling is used to learn language models that identify the language of isolated names and of equally short document fragments. As the empirical results corroborate the prior intuition, an explanation is sought for the higher accuracy with which the language of names can be identified. The results of applying these models, as well as the models themselves, are analysed quantitatively and qualitatively, and a hypothesis is formed to explain this difference. The conclusions are both technologically useful for information extraction and text-to-speech tasks, and theoretically interesting as a tool for improving our understanding of the morphology and phonology of the languages involved in the experiments.
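The abstract names n-gram modelling but does not give the model order, smoothing method, or decision rule. The sketch below is a minimal illustration under stated assumptions, not the paper's method: it assumes character trigrams with start/end padding, add-one (Laplace) smoothing, and picks the language whose model assigns the highest log-probability. The helper names (`char_ngrams`, `NgramLanguageModel`, `identify`) and the two-name training lists are hypothetical toy data invented for illustration, not the paper's corpus.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of `text`, padded with boundary markers.

    "^" marks the start and "$" the end of the string, so word-initial and
    word-final sequences are counted separately from word-internal ones.
    """
    padded = "^" * (n - 1) + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageModel:
    """Assumed setup: character n-gram model with add-one (Laplace) smoothing."""

    def __init__(self, samples, n=3):
        self.n = n
        self.counts = Counter()          # full n-gram counts
        self.context_counts = Counter()  # counts of the (n-1)-character histories
        self.alphabet = set()            # symbols seen in the predicted position
        for sample in samples:
            for gram in char_ngrams(sample, n):
                self.counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
                self.alphabet.add(gram[-1])

    def log_prob(self, text):
        """Smoothed log-probability of `text` under this model."""
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen symbols
        total = 0.0
        for gram in char_ngrams(text, self.n):
            numerator = self.counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + vocab
            total += math.log(numerator / denominator)
        return total


def identify(text, models):
    """Return the label whose model gives `text` the highest log-probability."""
    return max(models, key=lambda lang: models[lang].log_prob(text))


# Hypothetical toy training data, for illustration only.
models = {
    "en": NgramLanguageModel(["smith", "smithers"]),
    "it": NgramLanguageModel(["rossi", "russo"]),
}

print(identify("smithy", models))  # en
print(identify("rossa", models))   # it
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on longer inputs. Comparing names against equally short text fragments, as the abstract describes, would amount to running `identify` over two held-out test sets and comparing the resulting accuracies.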
Similar resources
A Comparative Genre Analysis of Memorial Cards in English and Funeral/Memorial Announcements in Persian
As covert socio-cultural relations have significant effects on language, these norms are reflected in linguistic and generic structure of public death notices as a distinctive genre. This study intended to identify the different genres of death notices (e.g. memorial advertisements, obituaries, funeral announcements/posters, memorial cards, etc.) and to conduct a comparative genre analysis of m...
Translation of Anthroponyms in Children’s Cartoons: A Comparative Analysis of English Dialogues and Persian Subtitles
The impact of animated cartoons on children has already been emphasized by quite many researchers. The present study aimed to investigate the strategies Iranian subtitlers of English animated cartoons used in rendering English anthroponyms in cartoons. To this aim, two theoretical frameworks were employed: Van Coillie's Model of Translating Proper Names and Fernandes’s Model of Proper N...
A comparative sociopragmatic analysis of wedding invitations in American and Iranian societies and teaching implications
Wedding invitations (WIs), as a uniquely socially and culturally constructed genre, provide a distinct opportunity to compare the sociocultural values of different speech communities as reflected in the textual content and organization of the different moves. Students can be exposed to this genre and its different moves using a genre-based pedagogy. Genre-based ped...
Application of Proper Nouns as Terms of Address in Russian Compared to their Persian Equivalents
This study delved into the application of proper nouns as terms of address in Russian and Persian. In other words, it examined the rules governing the application of terms of address expressed as the names of individuals in different speech situations in both languages. The comparative study of the cultural features of languages spoken by Russians and Iranians called for the investigation of th...
The Comparative Impact of Pictorial Annotations and Morphological Instruction on Lexical Inferencing of Iranian Intermediate EFL Learners
One of the main ways to acquire unfamiliar words is to make guesses about words meaning. This study investigates the comparative effects of pictorial annotations and morphological instructions on Iranian EFL learners’ lexical inferencing ability. Considering homogeneity issues using PET (Preliminary English Test), the researchers assigned the participants into two experimental and one control g...
Publication date: 2010